LCORE-1497: Fix RHOAI Prow e2e pipeline failures #1613
radofuchs merged 18 commits into lightspeed-core:main from
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Build llama-stack image in OpenShift internal registry via oc new-build/start-build
- Add image-puller role for default SA to pull from internal registry
- Add FAISS_VECTOR_STORE_ID and KV_RAG_PATH env vars to lightspeed-stack pod
- Add inference, byok_rag, and rag sections to prow lightspeed-stack configs
- Use envsubst with specific variable scoping in pipeline-services.sh (see the sketch below)
- Fix free_local_tcp_port to only kill LISTEN sockets (was killing behave process)
- Add MCP token secrets and empty OpenAI secret to pipeline.sh
- Add rlsapi_v1_infer action to prow RBAC config
- Simplify llama-stack.yaml to use pre-built image

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
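For reference, here is a minimal sketch of the envsubst scoping mentioned above, assuming GNU gettext's envsubst; the file and image names are illustrative, not taken from the PR:

```bash
# Scope envsubst to a single variable by passing a SHELL-FORMAT string.
export LLAMA_STACK_IMAGE="quay.io/example/llama-stack:dev"   # illustrative image

# Replaces only ${LLAMA_STACK_IMAGE}; every other ${VAR} in the manifest is
# passed through verbatim instead of being blanked when unset.
envsubst '${LLAMA_STACK_IMAGE}' < manifest.yaml | oc apply -n "$NAMESPACE" -f -
```

A bare `envsubst < manifest.yaml`, by contrast, substitutes every `${VAR}` it finds and blanks any that are unset, which is exactly what breaks embedded scripts.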
Add DEBUG NS checkpoints to trace when e2e-rhoai-dsc namespace disappears during operator bootstrapping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The RHOAI operator deletes the e2e-rhoai-dsc namespace during DSC reconciliation. Reorder pipeline to run operator bootstrap first, then create namespace and secrets after DSC settles. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename llama-stack.yaml to llama-stack-prow.yaml and add:
- Config enrichment via llama_stack_configuration.py
- restore_rag_seed() to re-inflate RAG db after enrichment
- PYTHONPATH, lightspeed-stack.yaml mount, rag-data mount
- materialize-run-yaml init container
- Model/provider overrides in inline_rag e2e tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create llama-stack-ip-secret before deploying the pod to fix chicken-and-egg dependency where the pod requires the secret as a non-optional env var
- Add LLAMA_STACK_CONFIG env var pointing to the correct emptyDir mount path where the materialize-run-yaml init container places run.yaml
- Use make test-e2e-local instead of test-e2e to avoid macOS-incompatible script -c flag
- Remove DEBUG NS echo lines from pipeline scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace unfiltered envsubst with sed in e2e-ops.sh restart commands to prevent blanking $VAR references in embedded bash scripts (see the sketch below)
- Add mock-jwks port-forward management (kill/restart/health check) so RBAC and MCP tests don't fail with connection refused on :8000
- Restart mock-jwks port-forward as part of lightspeed restart
- Increase vLLM max-model-len from 2048 to 32768 to avoid context length errors with RAG queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
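A hedged sketch of the sed-based substitution from the first bullet above; the placeholder and file names are illustrative. Replacing one known placeholder with sed leaves all other $VAR references, including those inside embedded bash, untouched:

```bash
# Substitute exactly one placeholder; the '|' delimiter avoids clashes with
# '/' characters in image paths. Everything else in the manifest is left as-is.
sed "s|\${LLAMA_STACK_IMAGE}|${LLAMA_STACK_IMAGE}|g" manifest.yaml | oc apply -f -
```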
verify_connectivity now checks that /v1/models returns 200 (not just /readiness) to ensure the app is fully initialized before declaring success. before_scenario in the test framework probes the port-forward before each scenario and auto-restarts it via e2e-ops if it is dead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace fragile oc delete + oc create with oc create --dry-run | oc apply so a failed update leaves the ConfigMap intact instead of deleted. The old approach caused 156 errored scenarios: if create failed after delete succeeded, the ConfigMap was gone and every subsequent update also failed. Also print stdout/stderr from e2e-ops on failure so the actual oc error is visible in test logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
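A minimal sketch of that update pattern, with illustrative ConfigMap and file names; the client-side dry-run renders the object and `oc apply` updates it in place:

```bash
# Render the ConfigMap locally, then apply it as a create-or-update.
oc create configmap lightspeed-stack-config \
  --from-file=lightspeed-stack.yaml="$NEW_CONFIG" \
  -n "$NAMESPACE" --dry-run=client -o yaml | oc apply -f -
# If the apply fails, the previous ConfigMap is still in place, unlike the
# delete-then-create sequence, which can leave no ConfigMap at all.
```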
On Prow, both /readiness and /v1/models return 401 when auth is enabled. The previous fix only accepted 200 from /v1/models, causing connectivity checks to always fail and port-forward to be declared dead. Accept 401 as valid — it proves the full app stack is running, not just the socket. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
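A sketch of the relaxed check, under the assumption stated above that a 401 from /v1/models proves the full app stack answered; the endpoint and port mirror the discussion:

```bash
# Treat 200 (auth off) or 401 (auth on) as a healthy, fully initialized app.
http_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
  http://localhost:8080/v1/models 2>/dev/null || echo 000)
if [[ "$http_code" == "200" || "$http_code" == "401" ]]; then
  echo "connectivity OK (HTTP $http_code)"
else
  echo "connectivity check failed (HTTP $http_code)" >&2
fi
```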
…tion

Llama Stack disruption tests left the pod dead after the feature because Behave clears custom context attributes between scenarios, so after_feature never saw llama_stack_was_running=True. This caused 59+ subsequent scenarios to cascade-fail with Connection refused. Three fixes:
- Store was_running in module-level state (survives Behave context resets) so after_feature reliably triggers _restore_llama_stack
- Add restart-lightspeed fallback in before_scenario when port-forward alone fails (recovers from dead pods, not just dead tunnels)
- Align pipeline.sh with pipeline-konflux.sh: export PID file paths for e2e-ops.sh, start Llama Stack port-forward on :8321, and use lsof/fuser fallback for port cleanup in minimal images

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TLS and proxy features depend on mock-tls-inference and proxy sidecars that are only deployed via Docker Compose, not in the OpenShift cluster. Every TLS scenario burned 200s waiting for a provider that never exists, consuming ~63 min of the 4h Prow timeout for guaranteed failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Walkthrough

Refactors e2e Prow manifests and orchestration: adds a new Prow-specific Llama Stack Pod with init containers for RAG/config materialization, removes the old manifest, bumps vLLM max model length, adds namespace/image handling and port-forward management in pipeline scripts, and introduces Prow-specific test gating and recovery logic.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Init as Init Container (RAG/Config)
participant CM as ConfigMap (rag-data / llama-stack-config)
participant Main as Main Container (llama-stack)
participant LCS as Lightspeed Service
participant Health as Health Probe (/v1/health)
CM->>Init: mount rag-data, llama-stack-config
Init->>Init: extract/decompress RAG KV (kv_store.db)
Init->>Init: materialize base run.yaml
Init-->>Main: signal ready
Main->>Main: prefer mounted gz seed else use extracted seed
Main->>Main: check for lightspeed-stack.yaml
alt lightspeed-stack.yaml present
Main->>Main: run llama_stack_configuration.py -> enriched run.yaml + .env
Main->>Main: source .env if present
Main->>Main: use enriched run.yaml
else
Main->>Main: fallback to original run.yaml
end
Main->>LCS: launch (llama stack run)
LCS->>Health: bind and expose /v1/health on :8321
Health-->>Main: readiness probe succeeds
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/e2e/features/steps/health.py (1)
68-95: ⚠️ Potential issue | 🟠 Major: Check the Docker running state explicitly against "true" before stopping.

Line 92 currently uses `if result.stdout.strip():`, which treats any non-empty string output as truthy. Since `docker inspect -f "{{.State.Running}}"` returns the string literal "true" or "false", the conditional incorrectly evaluates "false" as truthy, wrongly stopping a non-running container and incorrectly setting was_running = True.

Proposed fix

-    if result.stdout.strip():
+    if result.stdout.strip().lower() == "true":
         context.llama_stack_was_running = True
         _llama_stack_was_running["value"] = True
         subprocess.run(
             ["docker", "stop", "llama-stack"], check=True, capture_output=True
         )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/health.py` around lines 68 - 95: The Docker running-state check uses `if result.stdout.strip():`, which treats "false" as truthy; change the condition to explicitly compare the trimmed output to "true" (e.g. result.stdout.strip().lower() == "true") when setting context.llama_stack_was_running and _llama_stack_was_running["value"], and before calling subprocess.run to stop the container, so non-running containers aren't stopped and was_running is set correctly.

tests/e2e-prow/rhoai/pipeline.sh (1)
390-421: ⚠️ Potential issue | 🟠 Major: Don't require an unauthenticated 200 from /v1/models.

In auth-enabled Prow, Lightspeed is healthy when /v1/models returns 401, but this loop uses `curl -f` and will spin for three minutes before failing anyway. Please align the initial wait with verify_connectivity() in tests/e2e-prow/rhoai/scripts/e2e-ops.sh by accepting 200 or 401 after the forward comes up, or by probing /readiness first.

Suggested fix

 for i in $(seq 1 36); do
-  if curl -sf http://localhost:8080/v1/models > /dev/null 2>&1; then
+  http_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
+    http://localhost:8080/v1/models 2>/dev/null || echo 000)
+  if [[ "$http_code" == "200" || "$http_code" == "401" ]]; then
     echo "✅ Port-forward ready after $(( i * 5 ))s"
     break
   fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/pipeline.sh` around lines 390 - 421: The readiness loop currently requires an unauthenticated 200 from /v1/models; update the check in the loop that uses PF_LCS_PID to accept either HTTP 200 or 401 (unauthorized) as success (or alternatively probe /readiness first) so auth-enabled clusters don't wait the full timeout; modify the curl invocation logic to capture the HTTP status and treat 200 or 401 as success, keeping the existing restart-on-death behavior that inspects PF_LCS_PID and preserving the debug/logging and kill/exit flow on final timeout to mirror verify_connectivity() behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml`:
- Around line 33-41: The init script is seeding under /data/src/.llama/storage
which ends up nested in the main container; instead seed the shared mount root
(/data) so the main container's KV_RAG_PATH sees the files. Update the mkdir,
gunzip/cp and chmod invocations to create and write to /data/.llama/storage/rag
and /data/.llama/storage/files (not /data/src/...), e.g. ensure the script uses
/rag-data/kv_store.db.gz -> /data/.e2e-rag-seed/kv_store.db and then copies that
file to /data/.llama/storage/rag/kv_store.db, and adjust the final chmod to
operate on /data and /data/.e2e-rag-seed so the prepopulated path matches
KV_RAG_PATH.
In `@tests/e2e-prow/rhoai/pipeline-services.sh`:
- Around line 4-30: Add fail-fast behavior by inserting set -euo pipefail near
the top (immediately after the NAMESPACE assignment) so any failing command
(envsubst | oc apply, oc wait pod/llama-stack-service, oc expose pod
llama-stack-service, etc.) aborts the script; also stop swallowing errors on
secret creation—replace the silent redirection and echo around oc create secret
llama-stack-ip-secret with a check that only ignores "already exists" (e.g., try
oc create and on failure verify oc get secret llama-stack-ip-secret returns
success, otherwise exit) so unexpected failures surface immediately.
In `@tests/e2e-prow/rhoai/pipeline.sh`:
- Around line 359-369: The port-cleanup block only tries lsof and fuser and
needs the /proc fallback used in tests/e2e-prow/rhoai/scripts/e2e-ops.sh to
catch stale oc port-forward listeners; update the cleanup in pipeline.sh (the
block that checks command -v lsof / command -v fuser and kills ports 8080, 8000,
8321) to add the /proc-based scan as the final elif/else branch that iterates
/proc/*/fd, resolves socket links to find LISTEN sockets bound to those ports
and kills the owning PIDs, mirroring the logic from e2e-ops.sh so binds won’t
fail when lsof/fuser are absent.
- Around line 377-388: The JWKS port-forward started into PF_JWKS_PID (and
persisted to E2E_JWKS_PORT_FORWARD_PID_FILE) is not validated before tests run;
add a readiness check after spawning oc port-forward svc/mock-jwks by polling
localhost:8000/tokens (or invoking the existing restart-jwks-port-forward
helper) until it returns success or a timeout, and only proceed once /tokens is
reachable; ensure the health loop uses PF_JWKS_PID to detect if the background
process died and call restart-jwks-port-forward if needed so the RBAC tests
won't start against an unavailable JWKS endpoint.
In `@tests/e2e-prow/rhoai/scripts/bootstrap.sh`:
- Around line 95-96: The script contains two consecutive sleep commands ("sleep
5" followed by "sleep 10") which should be collapsed into a single sleep to
improve clarity; replace the two lines with one "sleep 15" in the bootstrap.sh
script so the total delay is preserved while simplifying the code.
In `@tests/e2e-prow/rhoai/scripts/e2e-ops.sh`:
- Around line 505-544: The health checks treat any non-"000" curl result as
healthy; change the logic that examines the curl http_code (the two occurrences
that compare http_code != "000" when checking
"http://127.0.0.1:$local_port/tokens") to require a successful response (e.g.
http_code == "200" or at minimum a 2xx check) before claiming the JWKS
port-forward is healthy and before writing pf_pid to
E2E_JWKS_PORT_FORWARD_PID_FILE; adjust the checks near the variables
E2E_JWKS_PORT_FORWARD_PID_FILE, jwks_pf_log, pf_pid and the curl invocations so
only a true success (200/2xx) allows early return or saving the PID.
In `@tests/e2e/utils/prow_utils.py`:
- Around line 208-209: Replace the two print calls used for update-configmap
diagnostics with the module logger: add "logger = get_logger(__name__)"
(importing get_logger from log.py) near the top of prow_utils.py, then change
the prints that reference result.stdout and result.stderr to structured logger
calls (e.g., logger.debug or logger.info with format placeholders) so the
messages use the module logger and pass result.stdout/result.stderr as
parameters instead of using print.
---
Outside diff comments:
In `@tests/e2e-prow/rhoai/pipeline.sh`:
- Around line 390-421: The readiness loop currently requires an unauthenticated
200 from /v1/models; update the check in the loop that uses PF_LCS_PID to accept
either HTTP 200 or 401 (unauthorized) as success (or alternatively probe
/readiness first) so auth-enabled clusters don't wait the full timeout; modify
the curl invocation logic to capture HTTP status and treat status 200 or 401 as
success, keeping the existing restart-on-death behavior that inspects PF_LCS_PID
and preserving the debug/logging and kill/exit flow on final timeout to mirror
verify_connectivity() behavior.
In `@tests/e2e/features/steps/health.py`:
- Around line 68-95: The Docker running-state check uses if
result.stdout.strip(): which treats "false" as truthy; change the condition to
explicitly compare the trimmed output to "true" (e.g.
result.stdout.strip().lower() == "true") when setting
context.llama_stack_was_running and _llama_stack_was_running["value"] and before
calling subprocess.run to stop the container so non-running containers aren't
stopped and was_running is set correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c48d2bc8-9576-4364-947c-545202f5ad32
📒 Files selected for processing (15)
- tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml
- tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
- tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml
- tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml
- tests/e2e-prow/rhoai/pipeline-services.sh
- tests/e2e-prow/rhoai/pipeline.sh
- tests/e2e-prow/rhoai/scripts/bootstrap.sh
- tests/e2e-prow/rhoai/scripts/e2e-ops.sh
- tests/e2e/features/environment.py
- tests/e2e/features/proxy.feature
- tests/e2e/features/steps/health.py
- tests/e2e/features/tls.feature
- tests/e2e/utils/prow_utils.py
💤 Files with no reviewable changes (2)
- tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
- tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing
Files:
tests/e2e/features/tls.feature, tests/e2e/features/proxy.feature, tests/e2e/utils/prow_utils.py, tests/e2e/features/steps/health.py, tests/e2e/features/environment.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py:
- Use absolute imports for internal modules: `from authentication import get_auth_dependency`
- Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
- Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
- Check constants.py for shared constants before defining new ones
- All modules start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from log.py for module logging
- Type aliases defined at module level for clarity
- Use Final[type] as type hint for all constants
- All functions require docstrings with brief descriptions
- Complete type annotations for parameters and return types in functions
- Use `typing_extensions.Self` for model validators in Pydantic models
- Use modern union type syntax `str | int` instead of `Union[str, int]`
- Use `Optional[Type]` for optional type hints
- Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
- Avoid in-place parameter modification anti-patterns; return new data structures instead
- Use `async def` for I/O operations and external API calls
- Handle `APIConnectionError` from Llama Stack in error handling
- Use standard log levels with clear purposes: debug, info, warning, error
- All classes require descriptive docstrings explaining purpose
- Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
- Use ABC for abstract base classes with `@abstractmethod` decorators
- Use `@model_validator` and `@field_validator` for Pydantic model validation
- Complete type annotations for all class attributes; use specific types, not `Any`
- Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
tests/e2e/utils/prow_utils.py, tests/e2e/features/steps/health.py, tests/e2e/features/environment.py
🧠 Learnings (8)
📚 Learning: 2026-02-19T10:06:50.647Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:50.647Z
Learning: In the rhoai tests under tests/e2e-prow/rhoai/manifests, avoid static ConfigMap definitions for mock-jwks-script and mcp-mock-server-script since these ConfigMaps are created dynamically by the pipeline.sh deployment script using 'oc create configmap'. Ensure there are no static ConfigMap resources for these names in the manifests. If such ConfigMaps are added in the future, coordinate with the pipeline to reflect dynamic creation or adjust tests to rely on the dynamic provisioning.
Applied to files:
tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml, tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml, tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml, tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/features/proxy.feature
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/features/proxy.feature
📚 Learning: 2026-02-19T10:06:58.708Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:58.708Z
Learning: In tests/e2e-prow/rhoai/, ConfigMaps like mock-jwks-script and mcp-mock-server-script are created dynamically by the pipeline.sh deployment script using `oc create configmap` commands, rather than being defined as static ConfigMap resources in the manifest YAML files.
Applied to files:
tests/e2e/utils/prow_utils.py, tests/e2e-prow/rhoai/scripts/bootstrap.sh, tests/e2e-prow/rhoai/pipeline-services.sh, tests/e2e-prow/rhoai/scripts/e2e-ops.sh, tests/e2e-prow/rhoai/pipeline.sh
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.
Applied to files:
tests/e2e/features/steps/health.py, tests/e2e/features/environment.py
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.
Applied to files:
tests/e2e/features/steps/health.py, tests/e2e/features/environment.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to tests/e2e/**/*.{py,feature} : Use behave (BDD) framework for end-to-end testing
Applied to files:
tests/e2e/features/environment.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
tests/e2e-prow/rhoai/pipeline.sh
🪛 Checkov (3.2.525)
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
[low] 7-205: CPU limits should be set
(CKV_K8S_11)
[low] 7-205: CPU requests should be set
(CKV_K8S_10)
[low] 7-205: The default namespace should not be used
(CKV_K8S_21)
[low] 7-205: Image should use digest
(CKV_K8S_43)
[low] 7-205: Image Tag should be fixed - not latest or blank
(CKV_K8S_14)
[low] 7-205: Memory limits should be set
(CKV_K8S_13)
[low] 7-205: Memory requests should be set
(CKV_K8S_12)
[low] 7-205: Use read-only filesystem for containers where possible
(CKV_K8S_22)
[low] 7-205: Containers should run as a high UID to avoid host conflict
(CKV_K8S_40)
[low] 7-205: Prefer using secrets as files over secrets as environment variables
(CKV_K8S_35)
[low] 7-205: Ensure that Service Account Tokens are only mounted where necessary
(CKV_K8S_38)
🪛 Trivy (0.69.3)
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
[info] 72-193: CPU not limited
Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.cpu'
Rule: KSV-0011
(IaC/Kubernetes)
[info] 48-70: CPU not limited
Container 'materialize-run-yaml' of Pod 'llama-stack-service' should set 'resources.limits.cpu'
Rule: KSV-0011
(IaC/Kubernetes)
[info] 18-47: CPU not limited
Container 'setup-rag-data' of Pod 'llama-stack-service' should set 'resources.limits.cpu'
Rule: KSV-0011
(IaC/Kubernetes)
[warning] 72-193: Image tag ":latest" used
Container 'llama-stack-container' of Pod 'llama-stack-service' should specify an image tag
Rule: KSV-0013
(IaC/Kubernetes)
[warning] 48-70: Image tag ":latest" used
Container 'materialize-run-yaml' of Pod 'llama-stack-service' should specify an image tag
Rule: KSV-0013
(IaC/Kubernetes)
[warning] 18-47: Image tag ":latest" used
Container 'setup-rag-data' of Pod 'llama-stack-service' should specify an image tag
Rule: KSV-0013
(IaC/Kubernetes)
[error] 72-193: Root file system is not read-only
Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 48-70: Root file system is not read-only
Container 'materialize-run-yaml' of Pod 'llama-stack-service' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 18-47: Root file system is not read-only
Container 'setup-rag-data' of Pod 'llama-stack-service' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[info] 72-193: CPU requests not specified
Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.cpu'
Rule: KSV-0015
(IaC/Kubernetes)
[info] 48-70: CPU requests not specified
Container 'materialize-run-yaml' of Pod 'llama-stack-service' should set 'resources.requests.cpu'
Rule: KSV-0015
(IaC/Kubernetes)
[info] 18-47: CPU requests not specified
Container 'setup-rag-data' of Pod 'llama-stack-service' should set 'resources.requests.cpu'
Rule: KSV-0015
(IaC/Kubernetes)
[info] 72-193: Memory requests not specified
Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.memory'
Rule: KSV-0016
(IaC/Kubernetes)
[info] 48-70: Memory requests not specified
Container 'materialize-run-yaml' of Pod 'llama-stack-service' should set 'resources.requests.memory'
Rule: KSV-0016
(IaC/Kubernetes)
[info] 18-47: Memory requests not specified
Container 'setup-rag-data' of Pod 'llama-stack-service' should set 'resources.requests.memory'
Rule: KSV-0016
(IaC/Kubernetes)
[info] 72-193: Memory not limited
Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.memory'
Rule: KSV-0018
(IaC/Kubernetes)
[info] 48-70: Memory not limited
Container 'materialize-run-yaml' of Pod 'llama-stack-service' should set 'resources.limits.memory'
Rule: KSV-0018
(IaC/Kubernetes)
[info] 18-47: Memory not limited
Container 'setup-rag-data' of Pod 'llama-stack-service' should set 'resources.limits.memory'
Rule: KSV-0018
(IaC/Kubernetes)
[info] 72-193: Runs with UID <= 10000
Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsUser' > 10000
Rule: KSV-0020
(IaC/Kubernetes)
[info] 72-193: Runs with GID <= 10000
Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsGroup' > 10000
Rule: KSV-0021
(IaC/Kubernetes)
[info] 48-70: Runs with GID <= 10000
Container 'materialize-run-yaml' of Pod 'llama-stack-service' should set 'securityContext.runAsGroup' > 10000
Rule: KSV-0021
(IaC/Kubernetes)
[info] 18-47: Runs with GID <= 10000
Container 'setup-rag-data' of Pod 'llama-stack-service' should set 'securityContext.runAsGroup' > 10000
Rule: KSV-0021
(IaC/Kubernetes)
[info] 9-12: Workloads in the default namespace
pod llama-stack-service in default namespace should set metadata.namespace to a non-default namespace
Rule: KSV-0110
(IaC/Kubernetes)
🔇 Additional comments (5)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1)
31-32: KV RAG path wiring looks good. The new KV_RAG_PATH env var is correctly scoped to the Lightspeed container and matches the intended RAG path usage.

tests/e2e/features/proxy.feature (1)

1-1: Prow skip tagging is correctly applied. Adding @skip-in-prow at feature scope is appropriate for scenarios requiring services not present in Prow.

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml (1)

27-27: The max-model-len update is consistent with the runtime adjustment. No issues in this changed segment.

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1)

27-27: CPU runtime max length update is aligned with related manifest changes. No issues in this changed segment.

tests/e2e/features/tls.feature (1)

1-1: Feature-level Prow gating is correctly added. The new @skip-in-prow tag cleanly excludes TLS scenarios in environments that lack required compose services.
mkdir -p /data/src/.llama/storage/rag /data/src/.llama/storage/files /data/.e2e-rag-seed
if [ ! -f /rag-data/kv_store.db.gz ]; then
  echo "FATAL: missing /rag-data/kv_store.db.gz"
  ls -la /rag-data || true
  exit 1
fi
gunzip -c /rag-data/kv_store.db.gz > /data/.e2e-rag-seed/kv_store.db
cp -f /data/.e2e-rag-seed/kv_store.db /data/src/.llama/storage/rag/kv_store.db
chmod -R 777 /data/src /data/.e2e-rag-seed
Seed the shared volume at its mounted root, not under /data/src/.llama/storage.
rag-storage is mounted as /data in this init container but as /opt/app-root/src/.llama/storage in the main container. Writing to /data/src/.llama/storage/... therefore lands at /opt/app-root/src/.llama/storage/src/.llama/storage/..., so the init container never prepopulates the path that KV_RAG_PATH points to.
Suggested fix
- mkdir -p /data/src/.llama/storage/rag /data/src/.llama/storage/files /data/.e2e-rag-seed
+ mkdir -p /data/rag /data/files /data/.e2e-rag-seed
...
- cp -f /data/.e2e-rag-seed/kv_store.db /data/src/.llama/storage/rag/kv_store.db
- chmod -R 777 /data/src /data/.e2e-rag-seed
+ cp -f /data/.e2e-rag-seed/kv_store.db /data/rag/kv_store.db
+ chmod -R 777 /data/rag /data/files /data/.e2e-rag-seed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml` around lines
33 - 41, The init script is seeding under /data/src/.llama/storage which ends up
nested in the main container; instead seed the shared mount root (/data) so the
main container's KV_RAG_PATH sees the files. Update the mkdir, gunzip/cp and
chmod invocations to create and write to /data/.llama/storage/rag and
/data/.llama/storage/files (not /data/src/...), e.g. ensure the script uses
/rag-data/kv_store.db.gz -> /data/.e2e-rag-seed/kv_store.db and then copies that
file to /data/.llama/storage/rag/kv_store.db, and adjust the final chmod to
operate on /data and /data/.e2e-rag-seed so the prepopulated path matches
KV_RAG_PATH.
 NAMESPACE="${NAMESPACE:-e2e-rhoai-dsc}"

-# Deploy llama-stack
-envsubst < "$BASE_DIR/manifests/lightspeed/llama-stack.yaml" | oc apply -f -
+# Create llama-stack-ip-secret before deploying the pod (it references the secret as an env var)
+export E2E_LLAMA_HOSTNAME="llama-stack-service-svc.${NAMESPACE}.svc.cluster.local"
+oc create secret generic llama-stack-ip-secret \
+  --from-literal=key="$E2E_LLAMA_HOSTNAME" \
+  -n "$NAMESPACE" 2>/dev/null || echo "Secret llama-stack-ip-secret exists"
+
+# Deploy llama-stack (substitute only LLAMA_STACK_IMAGE, leave other ${} intact)
+envsubst '${LLAMA_STACK_IMAGE}' < "$BASE_DIR/manifests/lightspeed/llama-stack-prow.yaml" | oc apply -n "$NAMESPACE" -f -

 oc wait pod/llama-stack-service \
-  -n e2e-rhoai-dsc --for=condition=Ready --timeout=600s
+  -n "$NAMESPACE" --for=condition=Ready --timeout=600s

-# Get url address of llama-stack pod
-oc label pod llama-stack-service pod=llama-stack-service -n e2e-rhoai-dsc
+# Expose llama-stack service
+oc label pod llama-stack-service pod=llama-stack-service -n "$NAMESPACE"

 oc expose pod llama-stack-service \
   --name=llama-stack-service-svc \
   --port=8321 \
   --type=ClusterIP \
-  -n e2e-rhoai-dsc
-
-export E2E_LLAMA_HOSTNAME="llama-stack-service-svc.e2e-rhoai-dsc.svc.cluster.local"
-
-oc create secret generic llama-stack-ip-secret \
-  --from-literal=key="$E2E_LLAMA_HOSTNAME" \
-  -n e2e-rhoai-dsc || echo "Secret exists"
+  -n "$NAMESPACE"

-# Deploy lightspeed-stack
-oc apply -f "$BASE_DIR/manifests/lightspeed/lightspeed-stack.yaml"
+# Deploy lightspeed-stack (substitute only LIGHTSPEED_STACK_IMAGE, leave other ${} intact)
+LIGHTSPEED_STACK_IMAGE="${LIGHTSPEED_STACK_IMAGE:-quay.io/lightspeed-core/lightspeed-stack:dev-latest}"
+export LIGHTSPEED_STACK_IMAGE
+envsubst '${LIGHTSPEED_STACK_IMAGE}' < "$BASE_DIR/manifests/lightspeed/lightspeed-stack.yaml" | oc apply -n "$NAMESPACE" -f -
Make this script fail fast on deployment errors.
This file still runs without set -euo pipefail, so a timed-out oc wait, failed oc expose, or failed manifest apply can be overwritten by a later successful command and pipeline.sh will continue with a half-deployed environment.
Suggested fix
#!/bin/bash
+set -euo pipefail
BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
NAMESPACE="${NAMESPACE:-e2e-rhoai-dsc}"
+: "${LLAMA_STACK_IMAGE:?LLAMA_STACK_IMAGE must be set}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/pipeline-services.sh` around lines 4 - 30, Add fail-fast
behavior by inserting set -euo pipefail near the top (immediately after the
NAMESPACE assignment) so any failing command (envsubst | oc apply, oc wait
pod/llama-stack-service, oc expose pod llama-stack-service, etc.) aborts the
script; also stop swallowing errors on secret creation—replace the silent
redirection and echo around oc create secret llama-stack-ip-secret with a check
that only ignores "already exists" (e.g., try oc create and on failure verify oc
get secret llama-stack-ip-secret returns success, otherwise exit) so unexpected
failures surface immediately.
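A sketch combining both suggestions, a fail-fast preamble plus secret creation that tolerates only the already-exists case; the names mirror the script under review, but the exact placement is the author's call:

```bash
set -euo pipefail   # abort on any failing command, unset variable, or pipe failure

if ! oc create secret generic llama-stack-ip-secret \
    --from-literal=key="$E2E_LLAMA_HOSTNAME" -n "$NAMESPACE" 2>/dev/null; then
  # Creation failed: acceptable only if the secret is already present.
  oc get secret llama-stack-ip-secret -n "$NAMESPACE" >/dev/null \
    || { echo "FATAL: could not create llama-stack-ip-secret" >&2; exit 1; }
  echo "Secret llama-stack-ip-secret already exists"
fi
```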
# Kill any existing processes on ports 8080, 8000, and 8321 (lsof may be missing in minimal images)
echo "Checking for existing processes on ports 8080, 8000, and 8321..."
if command -v lsof >/dev/null 2>&1; then
  lsof -ti:8080 | xargs kill -9 2>/dev/null || true
  lsof -ti:8000 | xargs kill -9 2>/dev/null || true
  lsof -ti:8321 | xargs kill -9 2>/dev/null || true
elif command -v fuser >/dev/null 2>&1; then
  fuser -k 8080/tcp 2>/dev/null || true
  fuser -k 8000/tcp 2>/dev/null || true
  fuser -k 8321/tcp 2>/dev/null || true
fi
Keep the /proc fallback here too.
This cleanup only uses lsof/fuser, but the comment already calls out minimal images. If neither tool exists, stale oc port-forward listeners survive and the new binds on 8080, 8000, or 8321 can keep failing with “address already in use”. Reuse the /proc-based cleanup from tests/e2e-prow/rhoai/scripts/e2e-ops.sh here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/pipeline.sh` around lines 359 - 369, The port-cleanup
block only tries lsof and fuser and needs the /proc fallback used in
tests/e2e-prow/rhoai/scripts/e2e-ops.sh to catch stale oc port-forward
listeners; update the cleanup in pipeline.sh (the block that checks command -v
lsof / command -v fuser and kills ports 8080, 8000, 8321) to add the /proc-based
scan as the final elif/else branch that iterates /proc/*/fd, resolves socket
links to find LISTEN sockets bound to those ports and kills the owning PIDs,
mirroring the logic from e2e-ops.sh so binds won’t fail when lsof/fuser are
absent.
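For illustration, one way such a /proc-based fallback could look; this is a hypothetical sketch (the helper name is invented, IPv4 only), not the actual e2e-ops.sh implementation:

```bash
# Find the owner of a LISTEN socket using only /proc, then kill it.
kill_port_proc() {
  local port=$1 hexport inode fd pid
  hexport=$(printf '%04X' "$port")
  # /proc/net/tcp: field 2 = local address (hex), field 4 = state
  # (0A means LISTEN), field 10 = socket inode.
  inode=$(awk -v p=":$hexport" '$2 ~ p"$" && $4 == "0A" {print $10}' /proc/net/tcp | head -n1)
  [[ -n "$inode" ]] || return 0
  for fd in /proc/[0-9]*/fd/*; do
    if [[ "$(readlink "$fd" 2>/dev/null)" == "socket:[$inode]" ]]; then
      pid=${fd#/proc/}; pid=${pid%%/*}
      kill -9 "$pid" 2>/dev/null || true
    fi
  done
}

for p in 8080 8000 8321; do kill_port_proc "$p"; done
```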
# Start port-forward for mock-jwks (needed for RBAC tests to get tokens)
echo "Starting port-forward for mock-jwks..."
oc port-forward svc/mock-jwks 8000:8000 -n $NAMESPACE &
PF_JWKS_PID=$!
echo "$PF_JWKS_PID" >"$E2E_JWKS_PORT_FORWARD_PID_FILE"

# Behave steps that call Llama Stack directly (MCP toolgroups, shields, disrupt/restore)
# need localhost:8321. Without this forward those tests hit "Connection refused".
echo "Starting port-forward for llama-stack..."
oc port-forward svc/llama-stack-service-svc 8321:8321 -n $NAMESPACE &
PF_LLAMA_PID=$!
echo "$PF_LLAMA_PID" >"$E2E_LLAMA_PORT_FORWARD_PID_FILE"
Validate the initial JWKS forward before starting tests.
Unlike the Lightspeed and Llama forwards, this one is never checked after startup. If oc port-forward svc/mock-jwks exits immediately or /tokens is not reachable yet, the first RBAC scenario still flakes before e2e-ops.sh gets a chance to repair it. Add a /tokens wait loop here or call restart-jwks-port-forward immediately after spawning the forward.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/pipeline.sh` around lines 377 - 388, The JWKS
port-forward started into PF_JWKS_PID (and persisted to
E2E_JWKS_PORT_FORWARD_PID_FILE) is not validated before tests run; add a
readiness check after spawning oc port-forward svc/mock-jwks by polling
localhost:8000/tokens (or invoking the existing restart-jwks-port-forward
helper) until it returns success or a timeout, and only proceed once /tokens is
reachable; ensure the health loop uses PF_JWKS_PID to detect if the background
process died and call restart-jwks-port-forward if needed so the RBAC tests
won't start against an unavailable JWKS endpoint.
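A minimal sketch of that readiness wait, reusing the service name and /tokens endpoint from the discussion:

```bash
oc port-forward svc/mock-jwks 8000:8000 -n "$NAMESPACE" &
PF_JWKS_PID=$!

for _ in $(seq 1 30); do
  # If the background forward died, respawn it before probing again.
  if ! kill -0 "$PF_JWKS_PID" 2>/dev/null; then
    oc port-forward svc/mock-jwks 8000:8000 -n "$NAMESPACE" &
    PF_JWKS_PID=$!
  fi
  if curl -sf --max-time 3 http://localhost:8000/tokens >/dev/null; then
    echo "mock-jwks is ready"
    break
  fi
  sleep 2
done
```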
sleep 5
sleep 10
🧹 Nitpick | 🔵 Trivial
Collapse consecutive sleeps into one for clarity.
Line 95 and Line 96 are equivalent to a single sleep 15, which is easier to read and maintain.
Proposed cleanup
-sleep 5
-sleep 10
+sleep 15
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/scripts/bootstrap.sh` around lines 95 - 96, The script
contains two consecutive sleep commands ("sleep 5" followed by "sleep 10") which
should be collapsed into a single sleep to improve clarity; replace the two
lines with one "sleep 15" in the bootstrap.sh script so the total delay is
preserved while simplifying the code.
@are-ces, sure! Here's the plan:
- Collapse the two consecutive `sleep` calls into a single `sleep 15` in bootstrap.sh
oc apply -f "$BASE_DIR/manifests/operators/ds-cluster.yaml"
-sleep 5
-sleep 10
+sleep 15
---
🧠 Learnings used
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:58.708Z
Learning: In tests/e2e-prow/rhoai/, ConfigMaps like mock-jwks-script and mcp-mock-server-script are created dynamically by the pipeline.sh deployment script using oc create configmap commands, rather than being defined as static ConfigMap resources in the manifest YAML files.
---
- [ ] 📌 Commit the changes to this branch
| if [[ -f "$E2E_JWKS_PORT_FORWARD_PID_FILE" ]]; then | ||
| local saved_pf | ||
| read -r saved_pf <"$E2E_JWKS_PORT_FORWARD_PID_FILE" 2>/dev/null || true | ||
| if [[ "$saved_pf" =~ ^[0-9]+$ ]] && kill -0 "$saved_pf" 2>/dev/null; then | ||
| local http_code | ||
| http_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://127.0.0.1:$local_port/tokens" 2>/dev/null) || http_code="000" | ||
| if [[ "$http_code" != "000" ]]; then | ||
| echo "✓ Mock JWKS port-forward already healthy (PID: $saved_pf)" | ||
| return 0 | ||
| fi | ||
| fi | ||
| fi | ||
|
|
||
| echo "Re-establishing mock-jwks port-forward on $local_port:$remote_port..." | ||
|
|
||
| for ((attempt=1; attempt<=max_attempts; attempt++)); do | ||
| kill_stale_jwks_forward "$local_port" | ||
| sleep 2 | ||
|
|
||
| echo "JWKS port-forward attempt $attempt/$max_attempts" | ||
|
|
||
| : >"$jwks_pf_log" | ||
| nohup oc port-forward svc/mock-jwks "$local_port:$remote_port" -n "$NAMESPACE" \ | ||
| </dev/null >"$jwks_pf_log" 2>&1 & | ||
| pf_pid=$! | ||
| disown "$pf_pid" 2>/dev/null || true | ||
| sleep 3 | ||
|
|
||
| if ! kill -0 "$pf_pid" 2>/dev/null; then | ||
| echo "JWKS port-forward process exited immediately" | ||
| continue | ||
| fi | ||
|
|
||
| local http_code | ||
| http_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://127.0.0.1:$local_port/tokens" 2>/dev/null) || http_code="000" | ||
| if [[ "$http_code" != "000" ]]; then | ||
| echo "$pf_pid" >"$E2E_JWKS_PORT_FORWARD_PID_FILE" | ||
| echo "✓ Mock JWKS port-forward established (PID: $pf_pid)" | ||
| return 0 | ||
| fi |
Require /tokens to succeed, not merely respond.
Both checks here treat any non-000 HTTP code as “healthy”. That lets a 404 or 500 on mock-jwks pass restart logic even though RBAC steps still cannot mint tokens.
Suggested fix
- if [[ "$http_code" != "000" ]]; then
+ if [[ "$http_code" == "200" ]]; then
echo "✓ Mock JWKS port-forward already healthy (PID: $saved_pf)"
return 0
fi
...
- if [[ "$http_code" != "000" ]]; then
+ if [[ "$http_code" == "200" ]]; then
echo "$pf_pid" >"$E2E_JWKS_PORT_FORWARD_PID_FILE"
echo "✓ Mock JWKS port-forward established (PID: $pf_pid)"
return 0
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e-prow/rhoai/scripts/e2e-ops.sh` around lines 505 - 544, The health
checks treat any non-"000" curl result as healthy; change the logic that
examines the curl http_code (the two occurrences that compare http_code != "000"
when checking "http://127.0.0.1:$local_port/tokens") to require a successful
response (e.g. http_code == "200" or at minimum a 2xx check) before claiming the
JWKS port-forward is healthy and before writing pf_pid to
E2E_JWKS_PORT_FORWARD_PID_FILE; adjust the checks near the variables
E2E_JWKS_PORT_FORWARD_PID_FILE, jwks_pf_log, pf_pid and the curl invocations so
only a true success (200/2xx) allows early return or saving the PID.
| print(f"update-configmap stdout: {result.stdout}") | ||
| print(f"update-configmap stderr: {result.stderr}") |
🛠️ Refactor suggestion | 🟠 Major
Use module logger instead of print for new failure diagnostics.
Line 208 and Line 209 should use structured logging to stay consistent with repo logging standards.
Proposed change
+from log import get_logger
+
+logger = get_logger(__name__)
@@
- print(f"update-configmap stdout: {result.stdout}")
- print(f"update-configmap stderr: {result.stderr}")
+ logger.error("update-configmap stdout: %s", result.stdout)
+ logger.error("update-configmap stderr: %s", result.stderr)As per coding guidelines, "Use logger = get_logger(__name__) from log.py for module logging".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/utils/prow_utils.py` around lines 208 - 209, Replace the two print
calls used for update-configmap diagnostics with the module logger: add "logger
= get_logger(__name__)" (importing get_logger from log.py) near the top of
prow_utils.py, then change the prints that reference result.stdout and
result.stderr to structured logger calls (e.g., logger.debug or logger.info with
format placeholders) so the messages use the module logger and pass
result.stdout/result.stderr as parameters instead of using print.
@radofuchs we should check whether the Konflux e2e tests run, there are large updates to |
Behave clears custom context attributes between scenarios, so hostname_llama/port_llama are gone by the time after_feature runs. Store them in module-level state (same pattern as llama_stack_was_running). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/features/environment.py`:
- Around line 446-451: The restore logic now runs unconditionally when
get_llama_stack_was_running() is true, which can attempt to restore a
llama-stack in library-mode runs; update the condition to also check that we're
not in library mode by reintroducing the context.is_library_mode guard (i.e.,
only call _restore_llama_stack() and reset_llama_stack_was_running() when
get_llama_stack_was_running() is true AND context.is_library_mode is False),
locating the check around get_llama_stack_was_running(), _restore_llama_stack,
and reset_llama_stack_was_running to ensure teardown skips llama-stack
restoration in library-mode scenarios.
- Around line 159-183: The hook currently logs failures from both
run_e2e_ops("restart-port-forward") and run_e2e_ops("restart-lightspeed") but
continues; change before_scenario so that after both recovery attempts either
re-probe the service readiness (e.g., call the existing readiness check or
run_e2e_ops("check-readiness")/similar) and if that probe still fails or any
restart timed out/returned non-zero, raise an exception (or call sys.exit) to
hard-fail the scenario; update the blocks around run_e2e_ops to capture
timeouts/non-zero return codes and, if recovery did not succeed, raise a
RuntimeError with a clear message so the scenario stops immediately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a55c71f3-961c-4ae1-b1d9-058d815b1cba
📒 Files selected for processing (2)
tests/e2e/features/environment.py, tests/e2e/features/steps/common.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-pr
- GitHub Check: radon
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py:
- Use absolute imports for internal modules: `from authentication import get_auth_dependency`
- Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
- Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
- Check constants.py for shared constants before defining new ones
- All modules start with descriptive docstrings explaining purpose
- Use `logger = get_logger(__name__)` from log.py for module logging
- Type aliases defined at module level for clarity
- Use Final[type] as type hint for all constants
- All functions require docstrings with brief descriptions
- Complete type annotations for parameters and return types in functions
- Use `typing_extensions.Self` for model validators in Pydantic models
- Use modern union type syntax `str | int` instead of `Union[str, int]`
- Use `Optional[Type]` for optional type hints
- Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
- Avoid in-place parameter modification anti-patterns; return new data structures instead
- Use `async def` for I/O operations and external API calls
- Handle `APIConnectionError` from Llama Stack in error handling
- Use standard log levels with clear purposes: debug, info, warning, error
- All classes require descriptive docstrings explaining purpose
- Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
- Use ABC for abstract base classes with `@abstractmethod` decorators
- Use `@model_validator` and `@field_validator` for Pydantic model validation
- Complete type annotations for all class attributes; use specific types, not `Any`
- Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
tests/e2e/features/steps/common.py, tests/e2e/features/environment.py
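To make the **/*.py conventions above concrete, here is a minimal module skeleton that follows several of them at once — the endpoint logic and the model-listing call are illustrative assumptions, not code from this repository:

"""Illustrative module showing the import, logging, and typing conventions above."""

from typing import Final

# Full dependency import per the guideline line; only HTTPException/status used here.
from fastapi import APIRouter, HTTPException, Request, status, Depends
from llama_stack_client import APIConnectionError, AsyncLlamaStackClient

from log import get_logger  # internal module, absolute import

logger = get_logger(__name__)

DEFAULT_TIMEOUT_SECONDS: Final[int] = 30  # constants carry Final[...] type hints


async def get_model_identifiers(client: AsyncLlamaStackClient) -> list[str]:
    """Fetch model identifiers from Llama Stack.

    Parameters:
        client: Configured async Llama Stack client.

    Returns:
        Identifiers of the models the stack reports.

    Raises:
        HTTPException: If Llama Stack is unreachable.
    """
    try:
        models = await client.models.list()
    except APIConnectionError as exc:  # guideline: handle APIConnectionError
        logger.error("Llama Stack unreachable: %s", exc)
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="Llama Stack connection failed",
        ) from exc
    return [model.identifier for model in models]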
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing
Files:
tests/e2e/features/steps/common.py, tests/e2e/features/environment.py
🧠 Learnings (3)
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.
Applied to files:
tests/e2e/features/steps/common.py, tests/e2e/features/environment.py
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.
Applied to files:
tests/e2e/features/steps/common.py, tests/e2e/features/environment.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to tests/e2e/**/*.{py,feature} : Use behave (BDD) framework for end-to-end testing
Applied to files:
tests/e2e/features/environment.py
🔇 Additional comments (1)
tests/e2e/features/steps/common.py (1)
24-41: Persisting the Llama endpoint outside context is the right fix. This keeps the teardown inputs alive after Behave drops scenario-scoped attributes from context. Based on learnings, step-set Behave context attributes are cleared after each scenario, so teardown data that must survive to after_feature cannot live only on context.
Also applies to: 63-64
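A minimal sketch of that module-level pattern, using the getter/reset names that appear in the teardown snippet further below — the setter and the flag layout are assumptions, not the PR's exact code:

# tests/e2e/features/steps/common.py — sketch of teardown state that survives
# Behave's per-scenario context cleanup; only the getter/reset names are taken
# from this PR, the rest is illustrative.
_llama_stack_was_running: bool = False


def set_llama_stack_was_running() -> None:
    """Record that llama-stack was up before a disruption scenario (assumed name)."""
    global _llama_stack_was_running
    _llama_stack_was_running = True


def get_llama_stack_was_running() -> bool:
    """Report whether teardown needs to restore llama-stack."""
    return _llama_stack_was_running


def reset_llama_stack_was_running() -> None:
    """Clear the flag once teardown has restored llama-stack."""
    global _llama_stack_was_running
    _llama_stack_was_running = False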
| print("[before_scenario] Port-forward appears dead, restarting...") | ||
| try: | ||
| result = run_e2e_ops("restart-port-forward", timeout=60) | ||
| print(result.stdout, end="") | ||
| if result.returncode == 0: | ||
| print("[before_scenario] Port-forward re-established") | ||
| return | ||
| print(result.stderr, end="") | ||
| except subprocess.TimeoutExpired: | ||
| pass | ||
|
|
||
| # Port-forward alone failed — the pod itself may be dead (e.g. Llama Stack | ||
| # was never restored after a disruption feature). Attempt a full restart, | ||
| # which also checks Llama health before recreating LCS. | ||
| print("[before_scenario] Port-forward failed; attempting full pod restart...") | ||
| try: | ||
| result = run_e2e_ops("restart-lightspeed", timeout=200) | ||
| print(result.stdout, end="") | ||
| if result.returncode != 0: | ||
| print(result.stderr, end="") | ||
| print("[before_scenario] Warning: full pod restart failed") | ||
| else: | ||
| print("[before_scenario] Pod restart + port-forward re-established") | ||
| except subprocess.TimeoutExpired: | ||
| print("[before_scenario] Warning: full pod restart timed out") |
Don’t continue the scenario after both port-forward recovery paths fail.
When Line 161 and Line 175 both fail or time out, this hook only logs and returns. The scenario then runs against a known-dead endpoint and fails later in unrelated steps, which makes CI triage much harder. Re-probe readiness and raise here if recovery did not succeed.
💡 Suggested hard-fail after unsuccessful recovery
 except subprocess.TimeoutExpired:
     print("[before_scenario] Warning: full pod restart timed out")
+
+try:
+    resp = requests.get(url, timeout=5)
+    if resp.status_code not in (200, 401):
+        raise AssertionError(
+            f"Lightspeed readiness check still failing after recovery: {resp.status_code}"
+        )
+except requests.RequestException as exc:
+    raise AssertionError(
+        "Lightspeed port-forward could not be recovered before the scenario"
+    ) from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/environment.py` around lines 159 - 183, the hook currently
logs failures from both run_e2e_ops("restart-port-forward") and
run_e2e_ops("restart-lightspeed") but continues; change before_scenario so that
after both recovery attempts it re-probes the service readiness (e.g., by calling
the existing readiness check or run_e2e_ops("check-readiness") or similar), and
if that probe still fails or any restart timed out or returned non-zero, raises
an exception (or calls sys.exit) to hard-fail the scenario; update the blocks
around run_e2e_ops to capture timeouts and non-zero return codes and, if recovery
did not succeed, raise a RuntimeError with a clear message so the scenario stops
immediately.
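For reference, a self-contained sketch of the hardened hook this prompt asks for — run_e2e_ops is the existing helper in this test suite (assumed to return a subprocess.CompletedProcess), and READINESS_URL is a hypothetical stand-in for the port-forwarded endpoint the existing readiness check uses:

# Sketch only — not the PR's code. Assumes run_e2e_ops() is importable from
# the surrounding test helpers and returns subprocess.CompletedProcess.
import subprocess
import requests

READINESS_URL = "http://localhost:8080/v1/models"  # hypothetical stand-in


def ensure_endpoint_recovered() -> None:
    """Attempt port-forward recovery, then a full restart; raise if both fail."""
    for op, timeout in (("restart-port-forward", 60), ("restart-lightspeed", 200)):
        try:
            result = run_e2e_ops(op, timeout=timeout)
            if result.returncode == 0:
                break  # restart reported success; still re-probe below
        except subprocess.TimeoutExpired:
            continue
    # Re-probe readiness regardless of what the restart commands reported.
    try:
        resp = requests.get(READINESS_URL, timeout=5)
        if resp.status_code in (200, 401):  # 401 still proves the app is up
            return
        detail = f"status {resp.status_code}"
    except requests.RequestException as exc:
        detail = repr(exc)
    raise RuntimeError(
        f"Lightspeed endpoint not recovered before scenario ({detail}); "
        "hard-failing instead of running against a dead port-forward."
    )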
# Restore Llama Stack FIRST (before any lightspeed-stack restart).
# Read from module-level state — Behave clears custom context attributes
# between scenarios, so context.llama_stack_was_running is unreliable here.
if get_llama_stack_was_running():
    _restore_llama_stack()
    reset_llama_stack_was_running()
Keep the library-mode guard around Llama Stack restoration.
Line 449 now restores whenever get_llama_stack_was_running() is true, but this file explicitly treats library mode as having no separate llama-stack runtime. Dropping the not context.is_library_mode check can make feature teardown try to restore a container/pod that should not exist in library runs.
💡 Suggested guard restoration
-if get_llama_stack_was_running():
+if not context.is_library_mode and get_llama_stack_was_running():
     _restore_llama_stack()
     reset_llama_stack_was_running()
📝 Committable suggestion
reset_llama_stack_was_running()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 # Restore Llama Stack FIRST (before any lightspeed-stack restart).
 # Read from module-level state — Behave clears custom context attributes
 # between scenarios, so context.llama_stack_was_running is unreliable here.
-if get_llama_stack_was_running():
+if not context.is_library_mode and get_llama_stack_was_running():
     _restore_llama_stack()
     reset_llama_stack_was_running()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/environment.py` around lines 446 - 451, the restore logic
now runs unconditionally when get_llama_stack_was_running() is true, which can
attempt to restore a llama-stack in library-mode runs; update the condition to
also check that we're not in library mode by reintroducing the
context.is_library_mode guard (i.e., only call _restore_llama_stack() and
reset_llama_stack_was_running() when get_llama_stack_was_running() is true AND
context.is_library_mode is False), locating the check around
get_llama_stack_was_running(), _restore_llama_stack, and
reset_llama_stack_was_running to ensure teardown skips llama-stack restoration
in library-mode scenarios.
Description
Fixes the RHOAI inference provider Prow e2e pipeline, which was failing due to multiple cascading issues: RHOAI image version bump, in-cluster image build errors, namespace handling, port-forward instability, secret ordering, Llama Stack disruption cascade, and missing RAG/enrichment configuration.
Key changes:
- llama-stack-prow.yaml manifest with enrichment + RAG restore init containers
- pipeline.sh: in-cluster image build, bootstrap ordering, namespace hardcoding, secret exports, config paths, model/provider override env vars
- e2e-ops.sh: restart failures, mock-jwks port-forward, configmap cascade, port-forward resilience
- verify_connectivity for auth-enabled Prow environments
- DataScienceCluster CR
Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
Verify here
Summary by CodeRabbit
Improvements
Tests